device variation
Trustworthy Tree-based Machine Learning by $MoS_2$ Flash-based Analog CAM with Inherent Soft Boundaries
Wen, Bo, Gao, Guoyun, Xu, Zhicheng, Mao, Ruibin, Qi, Xiaojuan, Hu, X. Sharon, Yin, Xunzhao, Li, Can
The rapid advancement of artificial intelligence has raised concerns regarding its trustworthiness, especially in terms of interpretability and robustness. Tree-based models like Random Forest and XGBoost excel in interpretability and accuracy for tabular data, but scaling them remains computationally expensive due to poor data locality and high data dependence. Previous efforts to accelerate these models with analog content addressable memory (CAM) have struggled because the sharp decision boundaries they require are difficult to implement and highly susceptible to device variations, leading to poor hardware performance and vulnerability to adversarial attacks. This work presents a novel hardware-software co-design approach using $MoS_2$ Flash-based analog CAM with inherent soft boundaries, enabling efficient inference with soft tree-based models. Our soft tree model inference experiments on $MoS_2$ analog CAM arrays show that this method achieves exceptional robustness against device variation and adversarial attacks while achieving state-of-the-art accuracy. Specifically, our fabricated analog CAM arrays achieve $96\%$ accuracy on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset while maintaining decision explainability. Our experimentally calibrated model shows only a $0.6\%$ accuracy drop on the MNIST dataset under $10\%$ device threshold variation, compared to a $45.3\%$ drop for traditional decision trees. This work paves the way for specialized hardware that enhances AI's trustworthiness and efficiency.
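The soft-boundary intuition is easy to see in miniature: a hard tree node flips its output outright when its threshold drifts, while a sigmoid-shaped soft node degrades gracefully. Below is a minimal numpy sketch assuming a sigmoid soft boundary with a hypothetical sharpness parameter beta; the paper's actual boundaries arise from the $MoS_2$ Flash device characteristics, not from this function.

```python
import numpy as np

def hard_node(x, t):
    """Hard decision boundary: abrupt 0/1 split at threshold t."""
    return (x > t).astype(float)

def soft_node(x, t, beta=5.0):
    """Soft decision boundary: sigmoid transition around threshold t.
    beta (an illustrative parameter) controls boundary sharpness."""
    return 1.0 / (1.0 + np.exp(-beta * (x - t)))

x = np.linspace(0.0, 1.0, 101)
t_nominal, t_shifted = 0.5, 0.55   # a 10% threshold variation

# A hard node flips its output entirely for inputs between the two
# thresholds; a soft node's output moves only slightly everywhere.
print(np.abs(hard_node(x, t_nominal) - hard_node(x, t_shifted)).max())  # 1.0
print(np.abs(soft_node(x, t_nominal) - soft_node(x, t_shifted)).max())  # ~0.06
```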
NVCiM-PT: An NVCiM-assisted Prompt Tuning Framework for Edge LLMs
Qin, Ruiyang, Ren, Pengyu, Yan, Zheyu, Liu, Liu, Liu, Dancheng, Nassereldine, Amir, Xiong, Jinjun, Ni, Kai, Hu, Sharon, Shi, Yiyu
Large Language Models (LLMs) deployed on edge devices, known as edge LLMs, need to continuously fine-tune their model parameters on user-generated data under tight resource constraints. However, most existing learning methods are not applicable to edge LLMs because of their high resource demands and the limited learning capacity of edge devices. Prompt tuning (PT) has recently emerged as an effective fine-tuning method for edge LLMs because it modifies only a small portion of LLM parameters, but it suffers from user domain shifts, resulting in repetitive training and a loss of resource efficiency. Conventional techniques to address domain shift often involve complex neural networks and sophisticated training procedures, which are incompatible with PT on edge LLMs. Therefore, an open research question is how to address domain shift for edge LLMs with limited resources. In this paper, we propose a prompt tuning framework for edge LLMs that exploits the benefits offered by non-volatile computing-in-memory (NVCiM) architectures. We introduce a novel NVCiM-assisted PT framework in which we narrow the core operations down to matrix-matrix multiplication, which can then be accelerated by performing in-situ computation on NVCiM. To the best of our knowledge, this is the first work employing NVCiM to improve edge LLM PT performance.
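As a rough illustration of why PT maps well onto NVCiM, the sketch below reduces the core computation to a matrix-matrix product: only a small soft-prompt matrix is trainable, while a frozen weight matrix stays resident in the crossbar and activations stream through it. The dimensions and the projection W_q are stand-in assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_prompt, n_tokens = 64, 8, 16

# Frozen LLM embeddings for the user's tokens (stand-in values).
token_emb = rng.standard_normal((n_tokens, d_model))

# Prompt tuning trains only this small soft-prompt matrix; the LLM is frozen.
soft_prompt = 0.02 * rng.standard_normal((n_prompt, d_model))

# Prepend the soft prompt, then apply a frozen projection layer.
x = np.vstack([soft_prompt, token_emb])        # (n_prompt + n_tokens, d_model)
W_q = rng.standard_normal((d_model, d_model))  # frozen weight, mapped to NVCiM

# This matrix-matrix product is what an NVCiM crossbar computes in situ:
# W_q stays resident in memory while x streams through as inputs.
q = x @ W_q
print(q.shape)  # (24, 64)
```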
Provable Accuracy Bounds for Hybrid Dynamical Optimization and Sampling
Burns, Matthew X., Hou, Qingyuan, Huang, Michael C.
Analog dynamical accelerators (DXs) are a growing sub-field in computer architecture research, offering order-of-magnitude gains in power efficiency and latency over traditional digital methods in several machine learning, optimization, and sampling tasks. However, limited-capacity accelerators require hybrid analog/digital algorithms to solve real-world problems, commonly using large-neighborhood local search (LNLS) frameworks. Unlike fully digital algorithms, hybrid LNLS has no non-asymptotic convergence guarantees and no principled hyperparameter selection schemes, particularly limiting cross-device training and inference. In this work, we provide non-asymptotic convergence guarantees for hybrid LNLS by reducing it to block Langevin Diffusion (BLD) algorithms. Adapting tools from classical sampling theory, we prove exponential KL-divergence convergence for randomized and cyclic block selection strategies using ideal DXs. With finite device variation, we provide explicit bounds on the 2-Wasserstein bias in terms of step duration, noise strength, and function parameters. Our BLD model provides a key link between established theory and novel computing platforms, and our theoretical results give a closed-form expression linking device variation, algorithm hyperparameters, and performance.
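To make the reduction concrete, the display below gives the textbook block Langevin update the abstract appeals to, assuming a target density $\pi \propto e^{-f}$ and an ideal DX; the paper's precise formulation may differ. At step $t$, a block of coordinates $B_t$ is chosen (randomized or cyclic) and updated while the rest are held fixed:

$$x^{(t+1)}_{B_t} = x^{(t)}_{B_t} - \eta \, \nabla_{B_t} f\big(x^{(t)}\big) + \sqrt{2\eta}\,\xi_t, \qquad \xi_t \sim \mathcal{N}(0, I),$$

where $\eta$ is the step size. In the hybrid LNLS setting, the analog DX realizes the within-block dynamics while the digital host performs block selection; finite device variation perturbs the drift term, which is what the 2-Wasserstein bias bounds quantify.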
TSB: Tiny Shared Block for Efficient DNN Deployment on NVCIM Accelerators
Qin, Yifan, Yan, Zheyu, Pan, Zixuan, Wen, Wujie, Hu, Xiaobo Sharon, Shi, Yiyu
Compute-in-memory (CIM) accelerators using non-volatile memory (NVM) devices offer promising solutions for energy-efficient and low-latency Deep Neural Network (DNN) inference. However, practical deployment is often hindered by the massive number of model weight parameters affected by the inherent device variations within non-volatile computing-in-memory (NVCIM) accelerators. This issue significantly offsets their advantages by increasing training overhead, the time needed to map weights to device states, and energy consumption, and by diminishing inference accuracy. To mitigate these challenges, we propose the "Tiny Shared Block (TSB)" method, which integrates a small shared 1x1 convolution block into the DNN architecture. This block is designed to stabilize feature processing across the network, effectively reducing the impact of device variation. Extensive experimental results show that, compared with state-of-the-art baselines, TSB narrows the inference accuracy gap by over 20x, speeds up training by over 5x, and reduces the weights-to-device mapping cost, while requiring less than 0.4% of the original weights to be write-verified during programming. Our approach provides a practical and efficient solution for deploying robust DNN models on NVCIM accelerators, making it a valuable contribution to the field of energy-efficient AI hardware.
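A minimal numpy sketch of the idea: a 1x1 convolution is per-pixel channel mixing, so a single small shared weight matrix can re-project features after each variation-affected layer. The placement and sizes here are illustrative assumptions, not the paper's exact architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    """Pointwise (1x1) convolution: mixes channels at each spatial location.
    x: (N, C_in, H, W), w: (C_out, C_in) -> (N, C_out, H, W)."""
    return np.einsum('nchw,oc->nohw', x, w)

# Hypothetical setup: one tiny 1x1 block shared across the whole network.
# Only this small block would be write-verified on the NVCIM accelerator;
# the large, variation-affected backbone weights are programmed as-is.
C = 32
tsb_weight = rng.standard_normal((C, C)) / np.sqrt(C)

def layer_with_tsb(x, noisy_backbone_weight):
    y = conv1x1(x, noisy_backbone_weight)  # variation-affected main layer
    return conv1x1(y, tsb_weight)          # shared block re-stabilizes features

x = rng.standard_normal((1, C, 8, 8))
noisy_w = rng.standard_normal((C, C)) / np.sqrt(C)
print(layer_with_tsb(x, noisy_w).shape)    # (1, 32, 8, 8)
```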
Robust Implementation of Retrieval-Augmented Generation on Edge-based Computing-in-Memory Architectures
Qin, Ruiyang, Yan, Zheyu, Zeng, Dewen, Jia, Zhenge, Liu, Dancheng, Liu, Jianbo, Zheng, Zhi, Cao, Ningyuan, Ni, Kai, Xiong, Jinjun, Shi, Yiyu
Large Language Models (LLMs) deployed on edge devices learn through fine-tuning and updating a certain portion of their parameters. Although such learning methods can be optimized to reduce resource utilization, the overall resource requirements remain a heavy burden on edge devices. Instead, Retrieval-Augmented Generation (RAG), a resource-efficient LLM learning method, can improve the quality of LLM-generated content without updating model parameters. However, a RAG-based LLM may perform repetitive searches over profile data in every user-LLM interaction, and as user data accumulates, this search incurs significant latency. Conventional efforts to decrease latency restrict the size of saved user data, reducing the scalability of RAG as user data continuously grows. It remains an open question how to free RAG from the constraints of latency and scalability on edge devices. In this paper, we propose a novel framework to accelerate RAG via Computing-in-Memory (CiM) architectures. It accelerates matrix multiplications by performing in-situ computation inside the memory, avoiding the expensive data transfer between the computing unit and memory. Our framework, Robust CiM-backed RAG (RoCR), uses a novel contrastive learning-based training method together with noise-aware training to enable RAG to search profile data efficiently with CiM. To the best of our knowledge, this is the first work utilizing CiM to accelerate RAG.
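The computation being accelerated can be sketched in a few lines: RAG's profile-data search is a single matrix-vector product over stored embeddings, which CiM evaluates in situ. Below, device variation is modeled as additive Gaussian noise on the stored values; the noise model, sigma, and top-k size are illustrative assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_docs = 128, 1000

# Profile-data embeddings stored in a CiM crossbar (stand-in values).
docs = rng.standard_normal((n_docs, d))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

def cim_retrieve(query, sigma=0.05, k=5):
    """Top-k profile search as one in-situ matrix-vector product.
    Device variation is modeled as Gaussian noise on stored values."""
    noisy_docs = docs + sigma * rng.standard_normal(docs.shape)
    scores = noisy_docs @ query          # computed inside the memory array
    return np.argsort(scores)[-k:][::-1]

# Noise-aware training (the RoCR idea, roughly): expose embeddings to the
# same noise model during contrastive training so rankings survive variation.
query = docs[42] + 0.01 * rng.standard_normal(d)
print(cim_retrieve(query))               # 42 should rank near the top
```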
Negative Feedback Training: A Novel Concept to Improve Robustness of NVCIM DNN Accelerators
Qin, Yifan, Yan, Zheyu, Wen, Wujie, Hu, Xiaobo Sharon, Shi, Yiyu
Compute-in-memory (CIM) accelerators built upon non-volatile memory (NVM) devices excel in energy efficiency and latency when performing Deep Neural Network (DNN) inference, thanks to their in-situ data processing capability. However, the stochastic nature and intrinsic variations of NVM devices often result in performance degradation during DNN inference. Introducing these non-ideal device behaviors during DNN training enhances robustness, but the drawbacks include limited accuracy improvement, reduced prediction confidence, and convergence issues. This arises from a mismatch between deterministic training and non-deterministic device variations: such training, though it considers variations, relies solely on the model's final output. In this work, we draw inspiration from control theory and propose a novel training concept, Negative Feedback Training (NFT), which leverages multi-scale noisy information captured from the network. We develop two specific NFT instances, Oriented Variational Forward (OVF) and Intermediate Representation Snapshot (IRS). Extensive experiments show that our methods outperform existing state-of-the-art methods with up to a 46.71% improvement in inference accuracy while reducing epistemic uncertainty, boosting output confidence, and improving convergence probability. Their effectiveness highlights the generality and practicality of our NFT concept in enhancing DNN robustness against device variations.
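One plausible reading of the IRS instance, sketched below: auxiliary heads take snapshots of intermediate representations computed under injected device noise, and their losses act as feedback signals alongside the final output. The network shape, noise model, and auxiliary heads are stand-ins, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_linear(x, w, sigma=0.1):
    """Forward pass with device-variation noise injected into the weights."""
    return np.maximum(x @ (w + sigma * rng.standard_normal(w.shape)), 0.0)

# Hypothetical 2-layer backbone with one auxiliary head per layer.
w1, w2 = rng.standard_normal((16, 16)), rng.standard_normal((16, 16))
head1, head2 = rng.standard_normal((16, 4)), rng.standard_normal((16, 4))

def irs_forward(x):
    h1 = noisy_linear(x, w1)
    h2 = noisy_linear(h1, w2)
    # Snapshots of intermediate representations feed auxiliary heads; their
    # losses serve as negative-feedback terms on the noisy internal signal.
    return h1 @ head1, h2 @ head2

aux1, aux2 = irs_forward(rng.standard_normal((2, 16)))
# Training would minimize: loss(final output) + weighted sum of aux losses.
print(aux1.shape, aux2.shape)  # (2, 4) (2, 4)
```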
Compute-in-Memory based Neural Network Accelerators for Safety-Critical Systems: Worst-Case Scenarios and Protections
Yan, Zheyu, Hu, Xiaobo Sharon, Shi, Yiyu
Emerging non-volatile memory (NVM)-based Computing-in-Memory (CiM) architectures show substantial promise in accelerating deep neural networks (DNNs) due to their exceptional energy efficiency. However, NVM devices are prone to device variations. Consequently, the actual DNN weights mapped to NVM devices can differ considerably from their targeted values, inducing significant performance degradation. Many existing solutions aim to optimize average performance amidst device variations, which is a suitable strategy for general-purpose conditions. However, the worst-case performance, which is crucial for safety-critical applications, is largely overlooked in current research. In this study, we define the problem of pinpointing the worst-case performance of CiM DNN accelerators affected by device variations. Additionally, we introduce a strategy to identify the specific pattern of device value deviations, within the complex, high-dimensional deviation space, that is responsible for this worst-case outcome. Our findings reveal that even subtle device variations can precipitate a dramatic decline in DNN accuracy, posing risks for CiM-based platforms in supporting safety-critical applications. Notably, we observe that prevailing techniques to bolster average DNN performance in CiM accelerators fall short in enhancing worst-case scenarios. In light of this issue, we propose a novel worst-case-aware training technique named A-TRICE that efficiently combines adversarial training and noise-injection training with right-censored Gaussian noise to improve DNN accuracy in the worst-case scenarios. Our experimental results demonstrate that A-TRICE improves the worst-case accuracy under device variations by up to 33%.
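Right-censored Gaussian noise itself is simple to state, as in the sketch below: draw Gaussian noise and clip everything above a censoring point, skewing the injected deviations to one side. The censoring point of 0 and the sigma used here are illustrative choices, not the paper's hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def right_censored_gaussian(shape, sigma=0.1, cap=0.0):
    """Gaussian noise with its right tail censored at `cap`: samples above
    the cap are clipped to it, so the noise is skewed downward."""
    return np.minimum(sigma * rng.standard_normal(shape), cap)

# During noise-injection training, weights see w + censored noise, so the
# model learns to tolerate one-sided deviations.
w = rng.standard_normal((4, 4))
w_noisy = w + right_censored_gaussian(w.shape)
print((w_noisy <= w).all())  # True: with cap=0, only downward shifts occur
```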
U-SWIM: Universal Selective Write-Verify for Computing-in-Memory Neural Accelerators
Yan, Zheyu, Hu, Xiaobo Sharon, Shi, Yiyu
Architectures that incorporate Computing-in-Memory (CiM) using emerging non-volatile memory (NVM) devices have become strong contenders for deep neural network (DNN) acceleration due to their impressive energy efficiency. Yet, a significant challenge arises when using these emerging devices: they can show substantial variations during the weight-mapping process, which can severely impact DNN accuracy if not mitigated. A widely accepted remedy for imperfect weight mapping is the iterative write-verify approach, which verifies conductance values and adjusts devices as needed. In all existing publications, this procedure is applied to every individual device, resulting in significant programming time overhead. In our research, we show that only a small fraction of weights need this write-verify treatment for their corresponding devices, with DNN accuracy preserved, yielding a notable programming acceleration. Building on this, we introduce USWIM, a novel method based on the second derivative. It leverages a single iteration of forward and backpropagation to pinpoint the weights demanding write-verify. Through extensive tests on diverse DNN designs and datasets, USWIM achieves up to a 10x programming acceleration over the traditional exhaustive write-verify method while maintaining a similar accuracy level. Furthermore, compared to our earlier SWIM technique, USWIM excels, showing a 7x speedup when dealing with devices exhibiting non-uniform variations.
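The selection criterion can be sketched on a toy problem where the second derivative is known exactly: score each weight by an estimate of $\partial^2 L/\partial w^2$ and write-verify only the highest-scoring fraction. The finite-difference estimate below is a stand-in for the paper's single forward-and-backpropagation pass.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: loss(w) = 0.5 * w^T A w with diagonal A, so the exact
# Hessian diagonal (per-weight second derivative) is just diag(A).
n = 8
A = np.diag(rng.uniform(0.1, 5.0, n))
w = rng.standard_normal(n)
grad = lambda v: A @ v

# One extra gradient evaluation approximates the Hessian diagonal here
# (a stand-in for U-SWIM's single forward + backpropagation pass).
eps = 1e-4
hess_diag = (grad(w + eps) - grad(w)) / eps

# Write-verify only the weights with the largest second derivatives.
budget = 0.25                      # verify 25% of the devices
k = max(1, int(budget * n))
selected = np.argsort(hess_diag)[-k:]
print(sorted(selected), sorted(np.argsort(np.diag(A))[-k:]))  # should agree
```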
Improving Realistic Worst-Case Performance of NVCiM DNN Accelerators through Training with Right-Censored Gaussian Noise
Yan, Zheyu, Qin, Yifan, Wen, Wujie, Hu, Xiaobo Sharon, Shi, Yiyu
Compute-in-Memory (CiM), built upon non-volatile memory (NVM) devices, is promising for accelerating deep neural networks (DNNs) owing to its in-situ data processing capability and superior energy efficiency. Unfortunately, the well-trained model parameters, after being mapped to NVM devices, can often exhibit large deviations from their intended values due to device variations, resulting in notable performance degradation in these CiM-based DNN accelerators. There exists a long list of solutions to address this issue. However, they mainly focus on improving the mean performance of CiM DNN accelerators. How to guarantee the worst-case performance under the impact of device variations, which is crucial for many safety-critical applications such as self-driving cars, has been far less explored. In this work, we propose to use the k-th percentile performance (KPP) to capture the realistic worst-case performance of DNN models executing on CiM accelerators. Through a formal analysis of the properties of KPP and the noise injection-based DNN training, we demonstrate that injecting a novel right-censored Gaussian noise, as opposed to the conventional Gaussian noise, significantly improves the KPP of DNNs. We further propose an automated method to determine the optimal hyperparameters for injecting this right-censored Gaussian noise during the training process. Our method achieves up to a 26% improvement in KPP compared to the state-of-the-art methods employed to enhance DNN robustness under the impact of device variations.
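KPP itself is straightforward to estimate by Monte Carlo, as sketched below: draw many device-variation instances, evaluate accuracy under each, and take the k-th percentile. The toy accuracy model is a stand-in for actually programming and evaluating a DNN on a CiM accelerator.

```python
import numpy as np

rng = np.random.default_rng(0)

def accuracy_under_variation(sigma=0.1):
    """Stand-in for one Monte Carlo trial: map the DNN onto devices with
    random variation and measure accuracy (a toy model, not a real DNN)."""
    return 0.95 - abs(sigma * rng.standard_normal())

# KPP: the accuracy exceeded in (100 - k)% of device-variation draws.
trials = np.array([accuracy_under_variation() for _ in range(1000)])
k = 1
kpp = np.percentile(trials, k)
print(f"mean accuracy {trials.mean():.3f}, {k}-th percentile (KPP) {kpp:.3f}")
```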
Computing-In-Memory Neural Network Accelerators for Safety-Critical Systems: Can Small Device Variations Be Disastrous?
Yan, Zheyu, Hu, Xiaobo Sharon, Shi, Yiyu
Computing-in-Memory (CiM) architectures based on emerging non-volatile memory (NVM) devices have demonstrated great potential for deep neural network (DNN) acceleration thanks to their high energy efficiency. However, NVM devices suffer from various non-idealities, especially device-to-device variations due to fabrication defects and cycle-to-cycle variations due to the stochastic behavior of devices. As such, the DNN weights actually mapped to NVM devices can deviate significantly from their expected values, leading to large performance degradation. To address this issue, most existing works focus on maximizing average performance under device variations. This objective works well for general-purpose scenarios, but for safety-critical applications the worst-case performance must also be considered. Unfortunately, this has rarely been explored in the literature. In this work, we formulate the problem of determining the worst-case performance of CiM DNN accelerators under the impact of device variations. We further propose a method to effectively find the specific combination of device variations in the high-dimensional space that leads to the worst-case performance. We find that even with very small device variations, the accuracy of a DNN can drop drastically, raising concerns about deploying CiM accelerators in safety-critical applications. Finally, we show that, surprisingly, none of the existing methods used to enhance average DNN performance in CiM accelerators is very effective when extended to enhance the worst-case performance, and further research is needed to address this problem.
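The worst-case search can be sketched as constrained gradient ascent on the loss over a bounded deviation ball, in the spirit of adversarial-example search applied to weights rather than inputs. The surrogate loss, bound, and step size below are illustrative; the paper's method for navigating the high-dimensional deviation space may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy differentiable loss surrogate: higher loss = worse accuracy.
w = 0.1 * rng.standard_normal(16)   # nominal weights
x = rng.standard_normal(16)         # a fixed input
loss = lambda d: np.tanh((w + d) @ x) ** 2
grad = lambda d: (2 * np.tanh((w + d) @ x)
                  * (1 - np.tanh((w + d) @ x) ** 2) * x)

# Projected sign-gradient ascent over the bounded deviation ball finds a
# small device-variation pattern that maximizes the loss (worst case).
bound, step = 0.05, 0.01
d = np.zeros_like(w)
for _ in range(100):
    d = np.clip(d + step * np.sign(grad(d)), -bound, bound)

print(f"nominal loss {loss(np.zeros_like(w)):.3f}, worst-case {loss(d):.3f}")
```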